Building an International Corpus of Arabic (ICA): Progress of Compilation Stage

Authors

  • Sameh Alansary
  • Magdy Nagi
  • Noha Adly
Abstract

This paper focuses on three axes. The first axis surveys the importance of corpora in language studies, e.g. lexicography, grammar, semantics, Natural Language Processing and other areas. The second axis demonstrates how the Arabic language lacks textual resources, such as corpora and tools for corpus analysis, and the effect of this lack on the quality of Arabic language applications. Successful attempts at compiling Arabic corpora are rare; therefore, the third axis presents the technical design of the International Corpus of Arabic (ICA), a newly established representative corpus of Arabic that is intended to cover the Arabic language as used all over the Arab world. The corpus is planned to support various Arabic studies that depend on authentic data, in addition to building Arabic Natural Language Processing applications.

Similar resources

The International Corpus of Arabic: Compilation, Analysis and Evaluation

This paper focuses on a project for building the first International Corpus of Arabic (ICA). It is planned to contain 100 million analyzed tokens with an interface that allows users to interact with the corpus data in a number of ways [ICA website]. ICA is a representative corpus of Arabic that was initiated in 2006; it is intended to cover the Modern Standard Arabic (MSA) language as bei...

Towards Analyzing the International Corpus of Arabic (ICA): Progress of Morphological Stage

This paper sheds light on four axes. The first axis deals with the levels of corpus analysis, e.g. morphological analysis, lexical analysis, syntactic analysis and semantic analysis. The second axis captures some attempts at Arabic corpus analysis. The third axis demonstrates different available tools for Arabic morphological analysis (Xerox, Tim Buckwalter, Sakhr and RDI). The fourth axis is th...

An Analysis of Cultural Factors Affecting the Design and Compilation of Islamic-Iranian Model of Progress (with an Emphasis on Moderation in Decisions and Policies)

Following the victory of the Islamic Revolution of Iran and the establishment of a new government within the framework of political Islam in the region, in order to continue and reinforce this government model, the Islamic-Iranian Model of Progress was put in the blueprint with regard to the indigenous standards as the most important preoccupation. After the third decade of the Revolution which has...

Entrepreneurship Approaches in Agricultural Cooperatives

Economists look at entrepreneurship from the perspective of profitability, investment, risk, and insight supporting economic development, but it seems that, in modern societies, entrepreneurship's function is beyond economic bounds. According to the stage and importance of entrepreneurship, which is known as the engine of society's economic and cultural evolution, it is necessary to d...

Arabic News Articles Classification Using Vectorized-Cosine Based on Seed Documents

Besides its own merits, text classification (TC) has become a cornerstone in many applications. The work presented here is part of, and a prerequisite for, a project we have undertaken to create a corpus for Arabic text processing. It is an attempt to create modules automatically that would help speed up the process of classification for any text categorization task. It also serves as a tool for...

Publication date: 2007